perm filename MCPAIN[F77,JMC] blob sn#318048 filedate 1977-11-23 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.REQUIRE "LSP.PUB[1,CLT]" SOURCE FILE
C00028 ENDMK
C⊗;
.REQUIRE "LSP.PUB[1,CLT]" SOURCE FILE
.turn on "∂,\"
.at "α" ⊂"%Aα%*"⊃
.at "β" ⊂"%Aβ%*"⊃
.portion main
.page←1
[This is a reprint with minor changes of "Correctness of a Compiler for
Arithmetic Expressions" by John McCarthy and James Painter which was
published in %2MATHEMATICAL ASPECTS OF COMPUTER SCIENCE%1 which was Volume
19 of %2Proceedings of Symposia in Applied Mathematics%1 and published
by the American Mathematical Society in 1967].

.cb Correctness of a Compiler for Arithmetic Expressions
.begin center
by
John McCarthy and James Painter
.end

.ss  Introduction.  

This paper contains a proof of the correctness of a simple
compiling algorithm for compiling arithmetic expressions into machine language.

The definition of correctness, the formalism used to express the description of
source language, object language and compiler, and the methods of proof are all
intended to serve as prototypes for the more complicted task of proving the
correctness of usable compilers.  The ultimate goal, as outlined in
references [1], [2], [3] and [4] is to make it possible to use a computer to
check proofs that compilers are correct.

The concepts of abstract syntax, state vector, the use of an interpreter for
defining the semantics of a programming language, and the definition of
correctness of a compiler are all the same as in [3].  The present paper,
however, is the first in which the correctness of a compiler is proved.

The expressions dealt with in this paper are formed from constants and
variables.  The only operation allowed is a binary + although no change
in method would be required to include any other binary operations.  An
example of an expression that can be compiled is

.once indent 8
⊗⊗(x + 3) + (x + (y + 2))⊗

although, because we use abstract syntax, no commitment to a particular
notation is made.

The computer language into which these expressions are compiled is a single
address computer with an accumulator, called ⊗ac, and four instructions:
⊗li (load immediate), ⊗load, ⊗sto (store) and ⊗add.  Note that there are
no jump instructions.  Needless to say, this is a severe restriction on
the generality of our results which we shall overcome in future work.

The compiler produces code that computes the value of the expression being
compiled and leaves this value in the accumulator.  The above expression
is compiled into code which in assembly language might look as follows:
.skip
.begin nofill; narrow 8
.select 2
load∂6x
sto∂6t
li∂63
add∂6t
sto∂6t
load∂6x
sto∂6t + 1
load∂6y
sto∂6t + 2
li∂62
add∂6t + 2
add∂6t + 1
add∂6t
.end

Again because we are using abstract syntax there is no commitment to a
precise form for the object code.


.ss  The source language. 

The abstract analytic syntax of the source expressions is given by the table:
.skip
.begin nofill; narrow 8
%3predicate∂(20)associated functions

.select 2
isconst(e)
isvar(e)
issum(e)∂(20)s1(e)     s2(e)
.end

which asserts that the expressions comprise constants, variables and binary
sums, that the predicates %2isconst, isvar, %1and %2issum %1enable one to
classify each expression and that each sum ⊗e has summands ⊗s1(e) and
⊗s2(e).  

The semantics is given by the formula
.skip
.begin nofill
.tabs 8
\⊗⊗value(e,qx) ← qif isconst(e) qthen val(e) qelse qif isvar(e) qthen c(e,qx)⊗ 
(2.1)
\⊗⊗qelse qif issum(e) qthen value(s1(e),qx) + value(s2(e),qx)⊗ 
.end

where ⊗val(e) gives the numerical value of an expression ⊗e representing
a constant, ⊗c(e,qx) %1gives the value of the variable ⊗e in the state
vector qx and + is some binary operation.  (It is natural to regard +
as an operation that resembles addition of real numbers, but our
results do not depend on this).

For proving the correctness of compilers,
we don't need a synthetic syntax for
source language expressions since the interpreter and compiler
use only the analytic syntax.  However, we shall need the following
induction principle for expressions:

Suppose qF is a predicate applicable to expressions, and suppose that
for all expressions ⊗e, we have
.skip
.begin narrow 8; nofill; select 2;
isconst(e) ⊃ qF(e)  %1and%*
isvar(e) ⊃ qF(e)  %1and%*
issum(e) ∧ qF(s1(e)) ∧ qF(s2(e)) ⊃ qF(e).
.end

%1Then we may conclude that qF%2(e) %1is true for all expressions ⊗e. 

.ss The object language.

We must give both the analytic and synthetic syntaxes for the object language
because the interpreter defining its semantics uses the analytic syntax and
the compiler uses the synthetic syntax.  We may write the analytic and
synthetic syntaxes for instructions in the following table.
.skip
.begin nofill;
.tabs 8,20,32,50; turn on "\";
\%3operation\predicate\analytic operation\synthetic operation

.tabs 9,20,36,54;
.select 2
\li∂(14)%Aa%*\isli(s)\arg(s)\mkli(%Aa%*)
\load∂(14)x\isload(s)\adr(s)\mkload(x)
\sto∂(14)x\issto(s)\adr(s)\mksto(x)
\add∂(14)x\isadd(s)\adr(s)\mkadd(x)
.end

A program is a list of instructions and ⊗null(p) asserts that ⊗p is the null
list.  If the program ⊗p is not null then ⊗first(p) gives the first instruction
and ⊗rest(p) gives the list of remaining instructions.  We shall use the
operation ⊗p1*p2 to denote the program obtained by appending ⊗p2 onto the
end of ⊗p1.  Since we have only one level of list we can identify a single
instruction with a program that has just one instruction.

The synthetic and analytic syntaxes of instructions are related by the 
following.
.skip
.begin nofill;
.tabs 10; turn on "\";
.select 2
\isli(mkli(%Aa%*))
\%Aa%* = arg(mkli(%Aa%*))
(3.1)\isli(s) ⊃ s = mkli(arg(s))
\null(rest(mkli(%Aa%*)))
\isli(s) ⊃ first(s) = s

\isload(mkload(x))
\x = adr(mkload(x))
(3.2)\isload(x) ⊃ x = mkload(adr(x))
\null(rest(mkload(x)))
\isload(s) ⊃ first(s) = s

\issto(mksto(x))
\x = adr(mksto(x))
(3.3)\issto(s) ⊃ s = mksto(adr(s))
\null(rest(mksto(x)))
\issto(s) ⊃ first(s) = s

\isadd(mkadd(x))
\x = adr(mkadd(x))
(3.4)\isadd(s) ⊃ s = mkadd(adr(s))
\null(rest(mkadd(x)))
\isadd(s) ⊃ first(s) = s

(3.5)\¬null(p) ⊃ p = first(p) * rest(p),

(3.6)     ¬null(p1) ∧ null(rest(p1)) ⊃ p1 = first(p1*p2)

(3.7)\null(p1*p2) ≡ null(p1) ∧ null(p2).
.end

The * operation is associative. (The somewhat awkward form of these relations
comes from having a general concatenation operation rather than just an
operation that prefixes a single instruction onto a program.)

A state vector for a machine gives, for each register in the machine, its
contents.  We include the accumulator denoted by ⊗ac as a register.  There
are two functions of state vectors as introduced in [3], namely

1.  %2c(x,qh) %1denotes the value of the contents of register ⊗x in machine
state qh.

2.  %2a(x,%Aa%*,qh) %1denotes the state vector that is obtained from the state
vector qh by changing the contents of register ⊗x to %Aa%* leaving the other
registers unaffected.

These functions satisfy the following relations:
.skip
(3.8)⊗⊗⊗c(x,a(y,%Aa%*,qh)) = qif x = y qthen %Aa%* qelse c(x,qh),⊗

(3.9)⊗⊗⊗a(x,%Aa%*,a(y,%Ab%*,qh)) = qif x = y qthen a(x,%Aa%*,qh) qelse a(y,%Ab%*,a(x,%Aa%*,qh)),⊗ 

(3.10)⊗⊗⊗a(x,c(x,qh),qh) = qh.⊗

Now we can define the semantics of the object language by
.skip
.begin nofill;
.tabs 10
.select 2
\step(x,qh) ← qif isli(s) qthen a(ac,arg(s),qh) qelse qif
\isload(s) qthen a(ac,c(adr(s),qh),qh) qelse qif
(3.11)\issto(s) qthen a(adr(s),c(ac,qh),qh) qelse qif
\isadd(s) qthen a(ac,c(adr(s),qh) + c(ac,qh),qh)
.end

which gives the state vector that results from executing an instruction and

(3.12)⊗⊗⊗outcome(p,qh) ← qif null(p) qthen qh qelse outcome(rest(p),step(first(p),qh))⊗ 

which gives the state vector that results from executing the program ⊗p 
with state vector qh.

The following lemma is left as an exercise for the reader.
.skip
(3.13)⊗⊗⊗outcome(p1*p2,qh) = outcome(p2,outcome(p1,qh))⊗


.ss The compiler.

We shall assume that there is a map giving for each variable in the
expression a location in the main memory of the machine.  %2loc(v)%1
gives this location and we shall assume

(4.1)⊗⊗⊗c(loc(v),qh) = c(v,qx)⊗

as a relation between the state vector qh before the compiled program
starts to act and the state vector qx of the source program.

Now we can write the compiler.  It is

⊗⊗⊗compile(e,t) ← qif isconst(e) qthen mkli(val(e))⊗

(4.2)⊗⊗⊗qelse qif isvar(e) qthen mkload(loc(e,map))⊗

⊗⊗⊗qelse qif issum(e) qthen compile(s1(e),t) * mksto(t) * compile(s2(e),t + 1) * mkadd(t)⊗ 

Here ⊗t is the number of a register such that all variables are stored in 
registers numbered less than ⊗t, so that registers ⊗t and above are available
for temporary storage.

Before we can state our definition of correctness of the compiler, we need a
notion of partial equality for state vectors

⊗⊗⊗qzq1 =%4A%* qzq2⊗

where qzq1 and qzq2 are state vectors and ⊗A is a set of variables means that
corresponding components of qzq1 and qzq2 are equal except possibly for values,
of variables in ⊗A.  Symbolically, %2x %8π %2A ⊃ c(x,qzq1) = c(x,qzq2)%1.  Partial
equality satisfies the following relations:

(4.3)⊗⊗⊗qzq1 = qzq2 %1is equivalent to qzq1 =%4{ }%* qzq2 where { } denotes the
empty set,⊗ 

(4.4)⊗⊗⊗%1if %2A ⊂ B%1 and qzq1 =%4A%1 qzq2 then qzq1 =%4B%1 qzq2,⊗ 

(4.5)⊗⊗⊗%1if qzq1 =%4A%* qzq2, then %2a(x,%Aa%*,qzq1) =%4A-{x} %2a(x,%Aa%*,qzq2),⊗ 

(4.6)⊗⊗⊗%1if %2x ε A %1then %2a(x,%Aa%*,qz) =%4A%2 qz,⊗  

(4.7)⊗⊗⊗%1if qzq1 =%4A%1 qzq2 and qzq2 =%4B%1 qzq3, then qzq1 =%4A∪B%1 qzq3.⊗%1

In our case we need a specialization of this notation and will use
.skip
.begin nofill
.tabs 8
\⊗⊗qzq1 =%4t%1 qzq2 to denote qzq1 =%4{x|x≥t} %1qzq2⊗ 
and
\⊗⊗qzq1 =%4ac%1 qzq2 to denote qzq1 =%4{ac} %1qzq2⊗ 
and
\⊗⊗qzq1 =%4t,ac%1 qzq2 to denote aqzq1 =%4{x|x = ac ∨ x ≥ t} %1qzq2. ⊗
.end

The correctness of the compiler is stated in

THEOREM 1.  %2If qh and qx are machine and source language state vectors
respectively such that

(4.8) c(loc(v),qh) = c(v,qx), then
.once indent 25; preface 0
outcome(compile(e,t),qh) =%4t%2 a(ac,value(e,qx),qh).

%1It states that the result of running the compiled program is to put the
value of the expression compiled into the accumulator.  No registers
except the accumulator and those with addresses ≥ ⊗t are affected.


.ss Proof of Theorem 1.
.break
The proof is accomplished by an induction on the expression ⊗e being compiled.
We prove it first for constants, then for variables, and then for sums on the
induction hypothesis that it is true for the summands.  Thus there are three
cases.
.skip
.begin nofill
I.  %2isconst(e).  %1We have∂(50)%3Justification%1

.tabs 25,55
%2outcome(compile(e,t),qh)\= outcome(mkli(val(e)),qh)\4.2
\= step(mkli(val(e)),qh)\3.12, 3.1
\= a(ac,arg(mkli(val(e))),qh)\3.1, 3.11
\= a(ac,val(e),qh)\3.1
\= a(ac,value(e,qx),qh)\2.1
\=%4t%2 a(ac,value(e,qx),qh).\4.3, 4.4
.skip 2
%1II.  %2isvar(e).  %1We have

.tabs 21,55
%2outcome(compile(e,t),qh)\= outcome(mkload(loc(e)),qh)\4.2
\= a(ac,c(adr(mkload(loc(e))),qh,qh)\3.12, 3.2, 3.11
\= a(ac,c(loc(e),qh),qh)\3.2
\= a(ac,c(e,qx),qh)\4.1
\= a(ac,value(e,qx),qh)\2.1
\=%4t%2 a(ac,value(e,qx),qh).\4.3, 4.4
.skip 2
%1III.  %2issum(e).  %1In this case, we first write

%2outcome(compile(e,t),qh)
∂4= outcome(compile(s1(e),t) * mksto(t)
∂(20)* compile(s2(e),t + 1) * mkadd(t),qh)\4.2
∂4= outcome(mkadd(t),outcome(compile(s2(e),t + 1),
∂(10)outcome(mksto(t),outcome(compile(s1(e),t),qh)))\3.13
.end

%1using the relation between concatenating programs and composing the
functions they represent.  Now we introduce some notation.  Let
.skip
.begin nofill
.tabs 20
\%2v = value(e,qx),
\vq1 = value(s1(e),qx),
\vq2 = value(s2(e),qx),
.end

%1so that %2v = vq1 + vq2%1.  Further let
.skip
.begin nofill
.tabs 15
\%2qzq1 = outcome(compile(s1(e),t),qh),
\qzq2 = outcome(mksto(t),qzq1),
\qzq3 = outcome(compile(s2(e),t + 1),qzq2),
\qzq4 = outcome(mkadd(t),qzq3)
.end

%1so that %2qzq4 = outcome(compile(e,t),qh)%1, and the statement to be proved
becomes

⊗⊗⊗qzq4 =%4t%2 a(ac,v,qh).⊗ 

We have
.skip
.begin nofill; tabs 8,10,40
%2\qzq1\= outcome(compile(s1(e),t),qh)
\\=%4t%2 a(ac,vq1,qh)__\%1Induction Hypothesis

and

\%2c(ac,qzq1) = vq1.\3.8

%1Now%2
\qzq2\= outcome(mksto(t),qzq1)
\\= a(t,c(ac,qzq1),qzq1)\3.12, 3.3, 3.11
\\= a(t,vq1,qzq1)\%1Substitution%2
\\=%4t+1%2 a(t,vq1,a(ac,vq1,qh))\4.5
\\=%4t+1,ac%2 a(t,vq1,qh)\4.6, 3.9
%1and%2
\c(t,qzq2) = vq1.\3.8
%1Next%2
\qzq3\= outcome(compile(s2(e),t + 1),qzq2)
\\=%4t+1%2 a(ac,vq2,qzq2).
.end

.next page
%1Here we use the induction hypothesis again to assume that ⊗s2(e) is compiled
correctly.  In order to apply it, we need %2c(loc(v),qzq2) = c(v,qx) %1 for
each variable ⊗v which is proved as follows:
.skip
.begin narrow 8; nofill
.tabs 15
%2c(loc(v),qzq2)\= c(loc(v),a(t,vq1,qh)) %1since %2loc(v) < t
\\= c(loc(v),qh) %1for the same reason%2
\\= c(v,qx) %1by the hypothesis of the theorem.
.end

Now we can continue with
.skip
.begin nofill; tabs 8,10,40
%2\qzq3 = %4t+1%2 a(ac,vq2,a(t,vq1,qh))\3.9, 4.5
%1and%2
\c(ac,qzq3) = vq2 %1and %2c(t,qzq3) = vq1.\3.8
%1Finally,%2
\qzq4\= outcome(mkadd(t),qzq3)
\\= a(ac,c(t,qzq3) + c(ac,qzq3),qzq3)\3.12, 3.4, 3.11
\\= a(ac,v,qzq3)\%1Definition of ⊗v, substitution%2
\\=%4t+1%2 a(ac,v,a(ac,vq2,a(t,vq1,qh)))\4.5
\\=%4t+1%2 a(ac,v,a(t,vq1,qh))\3.9
\\=%4t%2 a(ac,v,qh).\3.9, 4.6, 4.7
.end

%1This concludes the proof.

.ss Remarks.
.break
The problem of the relations between source language and object language
arithmetic is dealt with here by assuming that the + signs in formulas
(2.1) and (3.11) which define the semantics of the source and object
languages represent the same operation.  Theorem 1 does not depend on any
properties of this operation, not even commutativity or associativity.

The proof is entirely straightforward once the necessary machinery has been
created.  Additional operations such as subtraction, multiplication and
division could be added without essential change in the proof.

For example, to put multiplication into the system the following changes would
be required.

1.  Add ⊗isprod(e), and ⊗p1(e), and ⊗p1(e) to the abstract syntax of the source
language.

2.  Add a term

⊗⊗⊗qif isprod(e) qthen value(p1(e),qx) %8x%2 value(p2(e),qx)⊗ %1

to Equation (2.1).

3.  Add

⊗⊗⊗isprod(e) ∧ qF(p1(e)) ∧ qF(p2(e)) ⊃ qF(e)⊗ %1

to the hypotheses of the source language induction principle.

4.  Add an instruction ⊗mul ⊗x and the three syntactical functions ⊗ismul(s), 
⊗adr(s), ⊗mkmul(x) to the abstract syntax of the object language together
with the necessary relations among them.

5.  Add to the definition (3.11) of %2step%1 a term

⊗⊗⊗qelse qif ismul(s) qthen a(ac,c(adr(s),qh) %8x %2c(ac,qh),qh). ⊗ %1

6.  Add to the compiler a term

.once indent 8
%2qif isprod(e) qthen compile(p1(e),t) * mksto(t)
.begin indent 20; preface 0
* compile(p2(e),t + 1) * mkmul(t).
.end
.break
%17.  Add to the proof a case ⊗isprod(e) which parallels the case ⊗issum(e) exactly.

The following other extensions are contemplated.

.begin nofill; narrow 8
1.  Variable length sums.
2.  Sequences of assignment statements.
3.  Conditional expressions.
4.  %3go to%1 statements in the source language.
.end

In order to make these extensions a complete revision of the formalism will be
required.
1977 note: These and more extensions were made in Painter's thesis [5].
.skip 2
.once center
%3References%1
.break
.begin indent 4
1.  J. McCarthy, %2Computer programs for checking mathematical proofs, %1
Proc. Sympos. Pure Math. Vol. 5, Amer. Math. Soc., Providence, R. I., 1962,
pp. 219-227.

2.  -----------, "A basis for a mathematical theory of computation" in
%2Computer programming and formal systems, %1edited by P. Braffort and D.
Hershberg, North-Holland, Amsterdam, 1963.

3.  -----------, %2Towards a mathematical theory of computation, %1Proc.
Internat. Congr. on Information Processing, 1962.

4.  -----------, %2A formal description of a subset of Algol, %1Proc. Conf.
on Formal Language Description Languages, Vienna, 1964.

5. J. Painter, %2Semantic correctness of a compiler for an Algol-like
language, %1Ph.D. thesis, Stanford University, Stanford, California, 1967.
.end